C# binding for STFT pre-processing #1843
Conversation
namespace Microsoft.ML.OnnxRuntimeGenAI
{
    public static class SignalProcessor
    {
Can these APIs be added to the MultiModalProcessor instead of introducing a new class? All multi-modal models (e.g. Whisper, Phi-4 mm, etc) use that processor for pre-processing.
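For illustration, a rough sketch of what that could look like on the C# side, assuming the native entry points keep the shape declared in this PR; the method name, parameter list, partial declaration, and error handling below are assumptions, not the PR's actual code:

using System;

namespace Microsoft.ML.OnnxRuntimeGenAI
{
    // Hypothetical sketch: the new native call surfaced as a MultiModalProcessor method
    // instead of a separate static SignalProcessor class. The class is declared partial
    // here purely for illustration; all parameters are OgaTensor handles, mirroring the
    // C API declared in this PR.
    public partial class MultiModalProcessor
    {
        public void SplitSignalSegments(Tensor input, Tensor sampleRate, Tensor frameMs,
                                        Tensor hopMs, Tensor energyThresholdDb, Tensor output)
        {
            int status = NativeMethods.OgaSplitSignalSegments(
                input.Handle, sampleRate.Handle, frameMs.Handle,
                hopMs.Handle, energyThresholdDb.Handle, output.Handle);
            if (status != 0)
                throw new InvalidOperationException($"OgaSplitSignalSegments failed with code {status}");
        }
    }
}

MergeSignalSegments could follow the same pattern, which would keep all multi-modal pre-processing behind one entry point.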
[DllImport(NativeLib.DllName, CallingConvention = CallingConvention.Winapi)]
public static extern int /* extError_t */ OgaSplitSignalSegments(
    IntPtr /* const OgaTensor* */ input,
Can we follow the same tab spacings as the other native method APIs listed in this file for the new APIs?
}

template <typename T>
OrtxTensor* MakeOrtxTensor(Generators::Tensor* src) {
Since the Generators namespace is included at the top, I think you can remove the Generators:: part before each Tensor.
namespace Generators
    const OgaTensor* hop_ms_tensor,
    const OgaTensor* energy_threshold_db_tensor,
    OgaTensor* output0) {
  OGA_TRY
Can we add unit tests for the new APIs in ORT GenAI's C API tests?
 */
OGA_EXPORT OgaResult* OGA_API_CALL OgaProcessorProcessImagesAndAudiosAndPrompts(const OgaMultiModalProcessor*, const OgaStringArray* prompts, const OgaImages* images, const OgaAudios* audios, OgaNamedTensors** input_tensors);

OGA_EXPORT OgaResult* OGA_API_CALL OgaSplitSignalSegments(
Can you add C# API tests that call these C APIs?
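A sketch of what such a test could look like in the C# test project, loosely following the existing pinned-buffer Tensor pattern; the signal contents, shapes, and parameter tensors below are placeholders rather than real test data:

[Fact]
public void TestSplitSignalSegments()
{
    // Illustrative sketch only: values and shapes are placeholders.
    float[] signal = new float[16000];
    long[] shape = { 1, signal.Length };

    // Pin the managed array so its address stays valid for the native call.
    GCHandle pin = GCHandle.Alloc(signal, GCHandleType.Pinned);
    try
    {
        using var input = new Tensor(pin.AddrOfPinnedObject(), shape, ElementType.float32);
        Assert.NotNull(input);
        Assert.Equal(shape, input.Shape());

        // The sample-rate, frame_ms, hop_ms, and energy_threshold_db tensors would be
        // built the same way from pinned arrays of the matching element type, and the
        // new API would then be called with their handles and its output asserted on.
    }
    finally
    {
        pin.Free();
    }
}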
    const OgaTensor* energy_threshold_db_tensor,
    OgaTensor* output0);

OGA_EXPORT OgaResult* OGA_API_CALL OgaMergeSignalSegments(
Let's add comments above these new APIs to explain their usage and their parameters.
OGA_EXPORT OgaResult* OGA_API_CALL OgaMergeSignalSegments(
    const OgaTensor* segments_tensor,
    const OgaTensor* merge_gap_ms_tensor,
Given that the type of the object is an OgaTensor*, I think we can remove the _tensor suffix from the parameter names.
    reinterpret_cast<const Generators::Tensor*>(input),
    reinterpret_cast<const Generators::Tensor*>(sr_tensor),
    reinterpret_cast<const Generators::Tensor*>(frame_ms_tensor),
    reinterpret_cast<const Generators::Tensor*>(hop_ms_tensor),
To set the return value to output0 in a way that does not cause memory corruptions, you may need to use ReturnShared<OgaTensor> and change output0 to OgaTensor**.
onnxruntime-genai/src/ort_genai_c.cpp
Lines 632 to 638 in cac5438
OgaResult* OGA_API_CALL OgaTokenizerEncodeBatch(const OgaTokenizer* tokenizer, const char** strings, size_t count, OgaTensor** out) {
  OGA_TRY
  auto tensor = tokenizer->EncodeBatch(std::span<const char*>(strings, count));
  *out = ReturnShared<OgaTensor>(tensor);
  return nullptr;
  OGA_CATCH
}
Let's also rename output0 to out since there's just one output and to be consistent with existing API conventions.
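On the C# side, that change would roughly map to an out handle in the P/Invoke declaration, which the binding could then wrap via the internal Tensor(IntPtr) constructor so disposal is handled by the Tensor class; the declaration below is a sketch of that shape, not the PR's current code:

[DllImport(NativeLib.DllName, CallingConvention = CallingConvention.Winapi)]
public static extern IntPtr /* OgaResult* */ OgaSplitSignalSegments(
    IntPtr /* const OgaTensor* */ input,
    IntPtr /* const OgaTensor* */ sr,
    IntPtr /* const OgaTensor* */ frame_ms,
    IntPtr /* const OgaTensor* */ hop_ms,
    IntPtr /* const OgaTensor* */ energy_threshold_db,
    out IntPtr /* OgaTensor** */ output);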
It doesn't cause memory corruption anymore; the code I have here now works correctly, so I'd rather not risk breaking it again by changing it.
    OgaTensor* output0) {
  OGA_TRY
  return reinterpret_cast<OgaResult*>(
      Generators::MergeSignalSegments(
We can move SplitSignalSegments and MergeSignalSegments inside a processor object. The outputs from those methods can then be set to the output tensors from the processor APIs.
auto tensor = processor->SplitSignalSegments(...);
*out = ReturnShared<OgaTensor>(tensor);

auto tensor = processor->MergeSignalSegments(...);
*out = ReturnShared<OgaTensor>(tensor);

Then these new APIs could be called similarly to how methods such as OgaTokenizerEncodeBatch look in the above review comment. This will also help add future support for these new APIs in other language bindings.
Why would I have to construct a new object that has no connection to my current functionality? I think what we need to clarify here is whether we even need a Processor class at all, and if we do, what exactly should be inside it, because I see many methods that are not part of the class (e.g. ProcessTensor is not part of the class but lives in the same file).
I can do this, but we really need to clarify that first.
{
    long[] inputShape = new long[] { 1, inputSignal.Length };

    input = CreateFloatTensorFromArray(inputSignal, inputShape);
Can you construct a Tensor object directly for the input data? Each Tensor object has its own IntPtr handle that you can provide for the NativeMethods C API call.
onnxruntime-genai/src/csharp/Tensor.cs
Lines 28 to 45 in 9d3631d
public class Tensor : IDisposable
{
    private IntPtr _tensorHandle;
    private bool _disposed = false;

    public Tensor(IntPtr data, Int64[] shape, ElementType type)
    {
        Result.VerifySuccess(NativeMethods.OgaCreateTensorFromBuffer(data, shape, (UIntPtr)shape.Length, type, out _tensorHandle));
    }

    internal Tensor(IntPtr tensorHandle)
    {
        Debug.Assert(tensorHandle != IntPtr.Zero);
        _tensorHandle = tensorHandle;
        _disposed = false;
    }

    internal IntPtr Handle { get { return _tensorHandle; } }
The Tensor class can help with memory management during disposal.
Here is an example with Tensor from a unit test.
onnxruntime-genai/test/csharp/TestOnnxRuntimeGenAIAPI.cs
Lines 637 to 669 in 9d3631d
public void TestTensorAndAddExtraInput()
{
    string modelPath = _tinyRandomGpt2ModelPath;
    using var model = new Model(modelPath);
    Assert.NotNull(model);

    using var generatorParams = new GeneratorParams(model);
    Assert.NotNull(generatorParams);

    float[] data = { 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24 };
    long[] shape = { 3, 5 };

    // Pin the array to get its pointer
    GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
    try
    {
        IntPtr data_pointer = handle.AddrOfPinnedObject();
        using var tensor = new Tensor(data_pointer, shape, ElementType.float32);
        Assert.NotNull(tensor);
        Assert.Equal(shape, tensor.Shape());
        Assert.Equal(ElementType.float32, tensor.Type());

        using var generator = new Generator(model, generatorParams);
        Assert.NotNull(generator);
        generator.SetModelInput("test_input", tensor);
    }
    finally
    {
        handle.Free();
    }
}
Yes, this approach gave me memory corruption as well; there might be an issue with the Tensor class, but I can give it another try.
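For reference, a minimal sketch of that pattern applied to inputSignal; the key detail is that the GCHandle has to stay pinned until the native call returns, and the commented-out call is only indicative of where the handle would be used:

// Sketch only: pin the managed array, wrap it in a Tensor, and keep the GCHandle alive
// until the native call has returned; disposing the Tensor then releases the OgaTensor handle.
GCHandle pin = GCHandle.Alloc(inputSignal, GCHandleType.Pinned);
try
{
    using var input = new Tensor(pin.AddrOfPinnedObject(),
                                 new long[] { 1, inputSignal.Length }, ElementType.float32);
    // NativeMethods.OgaSplitSignalSegments(input.Handle, ...);  // remaining handles as before
}
finally
{
    pin.Free();
}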
const OrtxTensor* frame_tensor = Generators::MakeOrtxTensorConst<int64_t>(frame_ms_tensor);
const OrtxTensor* hop_tensor = Generators::MakeOrtxTensorConst<int64_t>(hop_ms_tensor);
const OrtxTensor* thr_tensor = Generators::MakeOrtxTensorConst<float>(energy_threshold_db_tensor);
OrtxTensor* out_tensor = Generators::MakeOrtxTensor<int64_t>(output0);
I would recommend looking at how the tokenizer and processor APIs are currently implemented (ex: Process method in the Whisper processor). That will help with making the implementations of the two methods similar to existing implementations.
- You can use CheckResult in place of the errcheck.
- Are tensors really needed for most of the parameters passed to OrtxSplitSignalSegments? The sampling rate (sr_tensor), frame size (frame_ms), hop length (hop_ms), and energy threshold (energy_threshold_db) are singular values. Can we modify OrtxSplitSignalSegments to use primitive types so that primitive types can be used in ORT GenAI for the "language binding API --> ORT GenAI C API --> ORT GenAI C++ API" flow? That would also greatly reduce the need for all of the workarounds you have in the C# bindings and in this file (a hypothetical C# signature along these lines is sketched after the example below).
- If OrtxSplitSignalSegments is updated, you can have the output stored as OrtxTensorResult** result inside that extensions API. Then you can create the output tensor inside this method instead of passing it in as a parameter to SplitSignalSegments. Here is an example.
onnxruntime-genai/src/models/whisper_processor.cpp
Lines 27 to 31 in 9d3631d
ort_extensions::OrtxObjectPtr<OrtxTensorResult> result;
CheckResult(OrtxFeatureExtraction(processor_.get(), audios->audios_.get(), result.ToBeAssigned()));

ort_extensions::OrtxObjectPtr<OrtxTensor> mel;
CheckResult(OrtxTensorResultGetAt(result.get(), 0, mel.ToBeAssigned()));
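And if OrtxSplitSignalSegments did take primitive values, the corresponding C# P/Invoke could shrink to something like the sketch below (the signature and parameter names are hypothetical), removing the need to build scalar tensors on the C# side just to pass single values through:

[DllImport(NativeLib.DllName, CallingConvention = CallingConvention.Winapi)]
public static extern IntPtr /* OgaResult* */ OgaSplitSignalSegments(
    IntPtr /* const OgaTensor* */ input,
    long sample_rate,
    long frame_ms,
    long hop_ms,
    float energy_threshold_db,
    out IntPtr /* OgaTensor** */ output);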
- It's more consistent to use tensors, but I can take your approach as well; I have no preference. I was just trying to be consistent, and I haven't seen built-in types used in the Oga API.
- It is possible, yes, but since I am initializing the tensors on the C# side, again for consistency and simplicity (no need for **), I decided to create everything on the caller side. The extension should, in theory, work for both cases.
Thanks for reviewing the PR @kunal-vaishnavi. I am currently still testing the changes from here with Audio Streaming, so for now I will only address the comments that are quick to resolve; the others will follow later.
No description provided.